Computer and Modernization (计算机与现代化)

• Algorithm Analysis and Design •

An Accelerated K-means Clustering Method Based on Random Sampling

Wang Xiuhua (王秀华)

  • Received: 2013-09-17 Revised: 1900-01-01 Online: 2013-12-18 Published: 2013-12-18

Abstract: The traditional K-means clustering algorithm cannot handle clustering of large-scale datasets. To address this problem, this paper presents an accelerated K-means clustering method based on random sampling, called the Kmeans_RS clustering algorithm. A working set is selected from the original dataset by random sampling, and the traditional K-means clustering method is executed on this working set. The center and radius of every resulting cluster are then computed to give the sampling result. The remaining data are clustered by measuring their relationship to this sampling result, which yields the final clustering result for the whole dataset. Because random sampling reduces the size of the problem that K-means has to solve, clustering efficiency is improved substantially and the method can be applied to large-scale clustering problems. Simulation results demonstrate that the accelerated K-means method achieves high clustering efficiency.

Key words: K-means clustering, random sampling, center, radius, working set, efficiency
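
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. The function names `lloyd_kmeans` and `kmeans_rs`, the parameters `sample_size` and `seed`, the definition of the radius as the largest center-to-member distance within the working set, and the nearest-center rule for the remaining data are all assumptions made for illustration, since the abstract does not specify how the relationship between the sampling result and the other data is measured.

```python
import math
import random


def lloyd_kmeans(points, k, iters=50, rng=None):
    """Plain Lloyd's K-means on a list of coordinate tuples; returns k centers."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # initial centers: k random points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group;
        # keep the old center if a group ends up empty.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:  # assignments are stable
            break
        centers = updated
    return centers


def kmeans_rs(data, k, sample_size, seed=0):
    """Hypothetical sketch: cluster a random working set, then label all data."""
    rng = random.Random(seed)
    working_set = rng.sample(data, sample_size)  # sampling without replacement
    centers = lloyd_kmeans(working_set, k, rng=rng)

    # Assumed radius: largest distance from a center to its working-set members.
    radii = [0.0] * k
    for p in working_set:
        j = min(range(k), key=lambda c: math.dist(p, centers[c]))
        radii[j] = max(radii[j], math.dist(p, centers[j]))

    # Assumed assignment rule: each point joins its nearest sampled center.
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
    return centers, radii, labels
```

In this sketch the iterative K-means step only ever sees `sample_size` points, and the full dataset is scanned once in the final labelling pass, which is where the efficiency gain described in the abstract would come from. The radii are returned but not used in the assignment; one possible use, again an assumption, is to flag points that fall outside the radius of their nearest cluster.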

CLC Number: